Below, I describe what I have learned about the different potential broadband availability datasources that Michele included in the notes for me, and what variables I think we might be able to include in an eventual datafile.
dat <- read.csv("VA-Fixed-Jun2020-v1.csv")
dat <- dat %>%
mutate(countycode = str_sub(BlockCode, 1, 5))
# Charlottesville area
cvlfips <- c("51540", "51003", "51065", "51079", "51109", "51125")
cvldat <- dat %>%
filter(countycode %in% cvlfips)
cvldat <- cvldat %>%
mutate(Blkgr = str_sub(BlockCode, 1, 12))
Variables in the FCC data that seem worth including/interesting:
The number of consumer internet providers in each su
The number of business internet providers in each su
The number of different holding companies providing services in each su (this may not be a big difference from the number of providers, but it may be interesting to see, for example, how often there are providers with ostensibly local-sounding names that are really just part of Verizon or some other large coorporation)
A binary variable indicating whether the su has "advanced telecommunications capability" which, based off the FCC's benchmark, this means internet speeds of 25/3 Mbps. At the block group level, this is represented at the number of block within the block group that do not meet the threshhold.
The FCC uses the following benchmarks: 50/5 Mbps, 100/10 Mbps, and 250/25 Mbps (in addition to 25/3 Mbps). I created a categorical variable for blocks meeting each of the download speed threshholds. I'm still thinking through how to represent this at the block group level.
Other considerations: The FCC has a goal of 100 Mbps per 1,000 students and staff, and 1 Gbps per 1,000 students and staff in every elementary and secondary school. If we ever plan to overlay these data with school locations, that might be something interesting to consider.
glimpse(cvldat)
## Rows: 62,590
## Columns: 17
## $ LogRecNo <int> 117519, 117520, 117521, 117522, 117523, 117524, 117…
## $ Provider_Id <int> 59682, 59682, 59682, 59682, 59682, 59682, 59682, 59…
## $ FRN <int> 6965750, 6965750, 6965750, 6965750, 6965750, 696575…
## $ ProviderName <chr> "Nelson County Cablevision Corp", "Nelson County Ca…
## $ DBAName <chr> "Nelson County Cablevision Corporation", "Nelson Co…
## $ HoldingCompanyName <chr> "Nelson County Cablevision Corp.", "Nelson County C…
## $ HocoNum <int> 240064, 240064, 240064, 240064, 240064, 240064, 240…
## $ HocoFinal <chr> "Nelson County Cablevision Corp.", "Nelson County C…
## $ StateAbbr <chr> "VA", "VA", "VA", "VA", "VA", "VA", "VA", "VA", "VA…
## $ BlockCode <dbl> 5.112595e+14, 5.112595e+14, 5.112595e+14, 5.112595e…
## $ TechCode <int> 50, 50, 50, 50, 50, 50, 50, 50, 50, 50, 50, 50, 50,…
## $ Consumer <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, …
## $ MaxAdDown <dbl> 1000, 1000, 1000, 1000, 1000, 1000, 1000, 1000, 100…
## $ MaxAdUp <dbl> 1000, 1000, 1000, 1000, 1000, 1000, 1000, 1000, 100…
## $ Business <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, …
## $ countycode <chr> "51125", "51125", "51125", "51125", "51125", "51125…
## $ Blkgr <chr> "511259502001", "511259502001", "511259502001", "51…
# Forming a block summary version
cvldat2 <- cvldat %>%
group_by(BlockCode) %>%
summarize(consproviders = sum(Consumer),
busproviders = sum(Business),
holdingcompany = length(unique(HoldingCompanyName[which(Consumer ==1)])),
avgMaxAdDown = mean(MaxAdDown),
avgMaxAdUp = mean(MaxAdUp)) %>%
mutate(Blkgr = str_sub(BlockCode, 1, 12))
cvldat2$under25.3mbps <- ifelse(cvldat2$avgMaxAdDown <25 & cvldat2$avgMaxAdUp <3, 1, 0)
sum(cvldat2$under25.3mbps)
## [1] 1254
sum(cvldat2$under25.3mbps) / length(unique(cvldat2$BlockCode))
## [1] 0.116825
There are 1,254 blocks with average max advertised broadband speeds of <25/3Mbs. That means that 11.68% of the Charlottesville region census blocks do not meet the FCC benchmark for having "advanced telecommunications capability".
Forming a catergorical variable for the FCC benchmarks for download speed.
cvldat2$downspeedcat <- ifelse(cvldat2$avgMaxAdDown <25, 1,
ifelse(cvldat2$avgMaxAdDown >=25 & cvldat2$avgMaxAdDown <50, 2,
ifelse(cvldat2$avgMaxAdDown >=50 & cvldat2$avgMaxAdDown <100, 3,
ifelse(cvldat2$avgMaxAdDown >= 100 & cvldat2$avgMaxAdDown <250, 4,
ifelse(cvldat2$avgMaxAdDown >= 250, 5, 0)))))
table(cvldat2$downspeedcat)
##
## 1 2 3 4 5
## 1409 4929 636 3328 432
Provider data
describe(cvldat2$consproviders)
## vars n mean sd median trimmed mad min max range skew kurtosis se
## X1 1 10734 4.82 1.57 5 4.8 1.48 1 11 10 0.1 -0.16 0.02
mean(cvldat2$consproviders - cvldat2$holdingcompany)
## [1] 0.1302404
There are an average of ~5 internet providers in each Charlottesville region block. The average difference in the number of providers and holding companies is less than one.
# Forming block group summary version (need to account for the fact that some providers
# are repeated within the block group)
cvldatblkgr <- cvldat %>%
group_by(Blkgr) %>%
summarize(consproviders = length(unique(ProviderName[which(Consumer == 1)])),
busproviders = length(unique(ProviderName[which(Business == 1)])),
holdingcompany = length(unique(HoldingCompanyName[which(Consumer ==1)])),
avgMaxAdDown = mean(MaxAdDown),
avgMaxAdUp = mean(MaxAdUp))
cvldatblkgr$under25.3mbps <- ifelse(cvldatblkgr$avgMaxAdDown <25 & cvldatblkgr$avgMaxAdUp <3, 1, 0)
sum(cvldatblkgr$under25.3mbps)
## [1] 2
There are only two block groups with average max advertised broadband speeds of <25/3Mbps
cvldatblkgr2 <- cvldat2 %>%
group_by(Blkgr) %>%
summarize(numberunderservedblocks = sum(under25.3mbps))
cvldatblkgr <- merge(cvldatblkgr, cvldatblkgr2, by = "Blkgr", all.x = T)
describe(cvldatblkgr$numberunderservedblocks)
## vars n mean sd median trimmed mad min max range skew kurtosis se
## X1 1 155 8.09 16.82 2 4.8 2.97 0 148 148 5.07 34.3 1.35
Some block groups don't have any under-served blocks, while the max is 148.
Other summaries
cvldatblkgr %>% select(c(Blkgr, consproviders, busproviders, avgMaxAdDown, avgMaxAdUp, numberunderservedblocks)) %>%
pivot_longer(-Blkgr, names_to = "measure", values_to = "value") %>%
ggplot(aes(x = value, fill = measure)) +
scale_fill_viridis(option = "plasma", discrete = TRUE, guide = FALSE) +
geom_histogram() +
facet_wrap(~measure, scales = "free")
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
Mapping the block group summaries
shape <- readRDS("cville_blkgps.RDS")
shape$Blkgr <- shape$GEOID
mapdat <- merge(shape, cvldatblkgr, by = "Blkgr", all.x = T)
mapdat <- st_transform(mapdat, crs = 4326)
Number of consumer internet providers
pal <- colorNumeric("plasma", reverse = TRUE, domain = mapdat$consproviders) # viridis
leaflet() %>%
addProviderTiles("CartoDB.Positron") %>%
addPolygons(data = mapdat,
fillColor = ~pal(consproviders),
weight = 1,
opacity = 1,
color = "white",
fillOpacity = 0.6,
highlight = highlightOptions(
weight = 2,
fillOpacity = 0.8,
bringToFront = T
),
popup = paste0("Block Group: ", shape$Blkgr, "<br>",
"Number of internet providers: ", mapdat$consproviders)
) %>%
addLegend("bottomright", pal = pal, values = mapdat$consproviders,
title = "Number of internet providers <br> per block group", opacity = 0.7)
Average maximum advertised download speeds
pal <- colorNumeric("plasma", reverse = TRUE, domain = mapdat$avgMaxAdDown) # viridis
leaflet() %>%
addProviderTiles("CartoDB.Positron") %>%
addPolygons(data = mapdat,
fillColor = ~pal(avgMaxAdDown),
weight = 1,
opacity = 1,
color = "white",
fillOpacity = 0.6,
highlight = highlightOptions(
weight = 2,
fillOpacity = 0.8,
bringToFront = T
),
popup = paste0("Block Group: ", shape$Blkgr, "<br>",
"Average maximum advertised download speeds: ", round(mapdat$avgMaxAdDown, 2))
) %>%
addLegend("bottomright", pal = pal, values = mapdat$avgMaxAdDown,
title = "Average maximum advertised download <br> speeds by block group", opacity = 0.7)
Average maximum advertised upload speeds
pal <- colorNumeric("plasma", reverse = TRUE, domain = mapdat$avgMaxAdUp) # viridis
leaflet() %>%
addProviderTiles("CartoDB.Positron") %>%
addPolygons(data = mapdat,
fillColor = ~pal(avgMaxAdUp),
weight = 1,
opacity = 1,
color = "white",
fillOpacity = 0.6,
highlight = highlightOptions(
weight = 2,
fillOpacity = 0.8,
bringToFront = T
),
popup = paste0("Block Group: ", shape$Blkgr, "<br>",
"Average maximum advertised upload speeds: ", round(mapdat$avgMaxAdUp, 2))
) %>%
addLegend("bottomright", pal = pal, values = mapdat$avgMaxAdUp,
title = "Average maximum advertised <br> upload speeds by block group", opacity = 0.7)
Number of blocks in each block group that don't meet the FCC threshold for "advanced telecommunications capability" (i.e. lower than 25/3 Mbps bandwidth)
pal <- colorNumeric("plasma", reverse = TRUE, domain = mapdat$numberunderservedblocks) # viridis
leaflet() %>%
addProviderTiles("CartoDB.Positron") %>%
addPolygons(data = mapdat,
fillColor = ~pal(numberunderservedblocks),
weight = 1,
opacity = 1,
color = "white",
fillOpacity = 0.6,
highlight = highlightOptions(
weight = 2,
fillOpacity = 0.8,
bringToFront = T
),
popup = paste0("Block Group: ", shape$Blkgr, "<br>",
"Number of underserved blocks: ", mapdat$numberunderservedblocks)
) %>%
addLegend("bottomright", pal = pal, values = mapdat$numberunderservedblocks,
title = "Number of underserved blocks <br> by block group", opacity = 0.7)
dat <- read.csv("broadband_data_opendatachallenge.csv")
glimpse(dat)
## Rows: 32,608
## Columns: 23
## $ Zip <int> 29620, 29628, 29638, 29639, 7…
## $ Population <chr> "12934", "2759", "2944", "174…
## $ County <chr> "Abbeville", "Abbeville", "Ab…
## $ State <chr> "South Carolina", "South Caro…
## $ WiredCount_2020 <int> 6, 4, 6, 3, 4, 7, 6, 5, 3, 4,…
## $ Fwcount_2020 <int> 0, 0, 1, 0, 3, 3, 4, 4, 2, 4,…
## $ AllProviderCount_2020 <int> 11, 8, 13, 8, 12, 15, 16, 15,…
## $ Wired25_3_2020 <int> 5, 3, 4, 3, 2, 5, 3, 3, 1, 3,…
## $ Wired100_3_2020 <int> 3, 2, 4, 3, 2, 3, 3, 1, 1, 2,…
## $ All25_3_2020 <int> 7, 5, 6, 5, 4, 8, 6, 5, 3, 6,…
## $ All100_3 <int> 3, 2, 4, 3, 2, 3, 3, 1, 1, 2,…
## $ TestCount <int> 2536, 100, 272, 163, 49, 687,…
## $ AverageMbps <dbl> 212.50, 51.12, 82.79, 93.12, …
## $ FastestAverageMbps <dbl> 536.35, 126.06, 222.35, 223.7…
## $ X.Access.to.Terrestrial.Broadband <chr> "98%", "95%", "95%", "90%", "…
## $ Lowest.Priced.Terrestrial.Broadband.Plan <chr> "40", "40", "40", "40", "69.9…
## $ WiredCount_2015 <int> 6, 4, 5, 3, 4, 6, 5, 1, 2, 2,…
## $ Fwcount_2015 <int> 0, 0, 0, 0, 2, 2, 2, 1, 1, 1,…
## $ AllProviderCount_2015 <int> 8, 6, 7, 5, 8, 10, 9, 4, 5, 5…
## $ Wired25_3_2015 <int> 3, 3, 2, 3, 2, 3, 2, 0, 1, 1,…
## $ Wired100_3_2015 <int> 3, 3, 2, 3, 1, 2, 1, 0, 1, 1,…
## $ All25_3_2015 <int> 3, 3, 2, 3, 2, 3, 2, 0, 1, 1,…
## $ All100_3.1 <int> 3, 3, 2, 3, 1, 2, 1, 0, 1, 1,…
vadat <- dat[which(dat$State == "Virginia"),]
cvldat <- vadat[which(vadat$County == c("Albemarle", "Fluvanna", "Greene", "Louisa", "Nelson")),]
## Warning in vadat$County == c("Albemarle", "Fluvanna", "Greene", "Louisa", :
## longer object length is not a multiple of shorter object length
# The column for Cville data has no information.
(c(cvldat$County, cvldat$Lowest.Priced.Terrestrial.Broadband.Plan))
## [1] "Albemarle" "Albemarle" "Albemarle" "Fluvanna" "Greene" "Louisa"
## [7] "Nelson" "Nelson" "Nelson" "NULL" "79.99" "65"
## [13] "79.99" "79.99" "79.99" "79.99" "NULL" "NULL"